In this dashboard, we summarize the information from the data provided by Ohio Department of Health.
In this data set, there are 8 variables.
Hospitalized Cases in Healthcare Zones
Death Cases in Healthcare Zones
Source: cleveland.com.
Today: April 05, 2020
The latest onset date is April 04, 2020.
Last Updated: Sun Apr 05 19:32:35 2020 EST
We excluded 3 people whose age is unknown.
We excluded 3 people whose age is unknown.
---
title: "Ohio COVID-19"
author: "Ying-Ju Tessa Chen"
output:
flexdashboard::flex_dashboard:
theme: journal
orientation: columns
social: ["facebook", "twitter", "linkedin"]
source_code: embed
---
```{r setup, include=FALSE}
library(flexdashboard) ## you need this package to create dashboard
```
Basic Information
=======================================================================
Column {data-width=400}
---
### Introduction
In this dashboard, we summarize the information from the data provided by Ohio Department of Health.
In this data set, there are 8 variables.
- **County**: 88 counties
- **Sex**: Female, Male, Unknown
- **Age Range**: 0-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80+, Unknown
- **Onset Data**
- **Date of Death**
- **Case Count**
- **Death Count**
- **Hospitalized Count**
```{r}
# load necessary packages
library(data.table)
library(ggplot2)
library(ggmap)
library(plotly)
library(plyr)
library(chron)
library(Hmisc)
library(stringr)
library(sp)
```
```{r}
df <- fread("https://coronavirus.ohio.gov/static/COVIDSummaryData.csv")
colnames(df) <- c("County", "Sex", "Age_Range", "Onset_Date",
"Date_Of_Death", "Case_Count",
"Death_Count", "Hospitalized_Count")
# remove the last row that shows the total count and make sure the type of each variable is correct
df <- as.data.frame(df[1:(nrow(df)-1),])
df[,1:3] <- lapply(df[,1:3], factor)
df[,4:5] <- lapply(df[,4:5], function(x) as.Date(x, "%m/%d/%Y"))
df[,6:8] <- lapply(df[,6:8], as.numeric)
R1 <- c("Williams", "Defiance", "Paulding", "Van Wert",
"Mercer", "Fulton", "Henry", "Putnam", "Allen",
"Auglaize", "Lucas", "Wood", "Hancock", "Ottawa",
"Sandusky", "Seneca", "Erie", "Huron")
R2 <- c("Lorain", "Cuyahoga", "Geauga", "Lake", "Ashtabula")
R3 <- c("Darke", "Preble", "Shelby", "Miami", "Montgomery",
"Champaign", "Clark", "Greene")
R4 <- c("Crawford", "Delaware", "Fairfield", "Fayette",
"Franklin", "Hardin", "Knox", "Licking", "Logan",
"Madison", "Marion", "Morrow", "Pickaway", "Union", "Wyandot")
R5 <- c("Richland", "Ashland", "Medina", "Wayne", "Holmes",
"Summit", "Stark", "Tuscarawas", "Trumbull", "Portage",
"Mahoning", "Columbiana", "Carroll")
R6 <- c("Butler", "Hamilton", "Warren", "Clermont",
"Clinton", "Highland", "Brown", "Adams")
R7 <- c("Ross", "Pike", "Scioto", "Hocking", "Vinton",
"Jackson", "Lawrence", "Athens", "Meigs", "Gallia")
R8 <- c("Coshocton", "Muskingum", "Perry", "Morgan",
"Guernsey", "Noble", "Washington", "Harrison",
"Belmont", "Monroe", "Jefferson")
Zone1 <- c(R1, R2, R5)
Zone2 <- c(R4, R7, R8)
Zone3 <- c(R3, R6)
df$Zones <- c(NA)
df$Zones <- ifelse(df$County%in%Zone1, 1, df$Zones)
df$Zones <- ifelse(df$County%in%Zone2, 2, df$Zones)
df$Zones <- ifelse(df$County%in%Zone3, 3, df$Zones)
Hospitalized_Zones <- table(df$Zones, df$Hospitalized_Count)
Hospitalized_Zones_Cases <- apply(Hospitalized_Zones, 1, function(x) sum(x*as.numeric(colnames(Hospitalized_Zones))))
Cases_Zones <- table(df$Zones, df$Case_Count)
Cases_Zones_all <- apply(Cases_Zones, 1, function(x) sum(x*as.numeric(colnames(Cases_Zones))))
Death_Zones <- table(df$Zones, df$Death_Count)
Death_Zones_Cases <- apply(Death_Zones, 1, function(x) sum(x*as.numeric(colnames(Death_Zones))))
```
\
**Hospitalized Cases in Healthcare Zones**
- **Zone 1 (Cleveland Area):** `r unname(Hospitalized_Zones_Cases[1])` Cases (`r round(Hospitalized_Zones_Cases[1]/Cases_Zones_all[1]*100, 2)`%)
- **Zone 2 (Columbus Area):** `r unname(Hospitalized_Zones_Cases[2])` Cases (`r round(Hospitalized_Zones_Cases[2]/Cases_Zones_all[2]*100, 2)`%)
- **Zone 3 (Cincinnati/Dayton):** `r unname(Hospitalized_Zones_Cases[3])` Cases (`r round(Hospitalized_Zones_Cases[3]/Cases_Zones_all[3]*100, 2)`%)
\
**Death Cases in Healthcare Zones**
- **Zone 1 (Cleveland Area):** `r unname(Death_Zones_Cases[1])` Cases (`r round(Death_Zones_Cases[1]/Cases_Zones_all[1]*100, 2)`%)
- **Zone 2 (Columbus Area):** `r unname(Death_Zones_Cases[2])` Cases (`r round(Death_Zones_Cases[2]/Cases_Zones_all[2]*100, 2)`%)
- **Zone 3 (Cincinnati/Dayton):** `r unname(Death_Zones_Cases[3])` Cases (`r round(Death_Zones_Cases[3]/Cases_Zones_all[3]*100, 2)`%)
\
**Source:** cleveland.com.
\
Column {data-width=600}
---
```{r}
all_dates <- names(table(df$Onset_Date))
latest_date <- sort(df$Onset_Date, decreasing = TRUE)[1]
```
### Summary Statistics
**Today: `r format(Sys.Date(), "%B %d, %Y")`**
**The latest onset date is `r format(latest_date, "%B %d, %Y")`.**
- Total Number of **Confirmed Cases**: `r sum(df$Case_Count)`
- Total Number of **Hospitalizations**: `r sum(df$Hospitalized_Count)`
- Total Number of **Deaths**: `r sum(df$Death_Count)`
- **Death Rate in Ohio**: `r paste0(round(sum(df$Death_Count)/sum(df$Case_Count)*100, 2), "%")`
**Last Updated:** `r date()` EST
### Distribution of Confirmed Cases by Age
```{r}
AGE_summary <- table(df$Age_Range)
AGE_count <- as.vector(unname(AGE_summary))
AGE <- data.frame(age=AGE_count, percent=paste0(round(AGE_count/sum(AGE_count)*100, 2), "%"))
rownames(AGE) <- names(AGE_summary)
colnames(AGE) <- c("Count", "Percent")
DT::datatable(t(AGE), options = list(
columnDefs = list(list(className = 'dt-center', targets = 0:nrow(AGE)))
))
```
### Distribution of Confirmed Cases by Sex
```{r}
Sex_summary <- table(df$Sex)
Sex_count <- as.vector(unname(Sex_summary))
SEX <- data.frame(sex=Sex_count, percent=paste0(round(Sex_count/sum(Sex_count)*100, 2), "%"))
rownames(SEX) <- names(Sex_summary)
colnames(SEX) <- c("Count", "Percent")
DT::datatable(t(SEX), options = list(
columnDefs = list(list(className = 'dt-center', targets = 0:nrow(SEX)))
))
```
Daily Cases
=======================================================================
Column {.tabset data-width=500}
-----------------------------------------------------------------------
```{r}
date_sum <- table(df$Onset_Date, df$Case_Count)
daily_cases <- apply(date_sum, 1, function(x) sum(x*as.numeric(colnames(date_sum))))
monthly <- data.frame(dates=as.Date(all_dates, "%Y-%m-%d"), cases=daily_cases)
rownames(monthly) <- c()
cal <- function(month, year) {
if(missing(year) && missing(month)) {
tmp <- month.day.year(Sys.Date())
year <- tmp$year
month <- tmp$month
}
if(missing(year) || missing(month)){ # year calendar
if(missing(year)) year <- month
par(mfrow=c(4,3))
tmp <- seq.dates( from=julian(1,1,year), to=julian(12,31,year) )
tmp2 <- month.day.year(tmp)
wd <- do.call(day.of.week, tmp2)
par(mar=c(1.5,1.5,2.5,1.5))
for(i in 1:12){
w <- tmp2$month == i
cs <- cumsum(wd[w]==0)
if(cs[1] > 0) cs <- cs - 1
nr <- max( cs ) + 1
plot.new()
plot.window( xlim=c(0,6), ylim=c(0,nr+1) )
text( wd[w], nr - cs -0.5 , tmp2$day[w] )
title( main=month.name[i] )
text( 0:6, nr+0.5, c('S','M','T','W','T','F','S') )
}
} else { # month calendar
ld <- seq.dates( from=julian(month,1,year), length=2, by='months')[2]-1
days <- seq.dates( from=julian(month,1,year), to=ld)
tmp <- month.day.year(days)
wd <- do.call(day.of.week, tmp)
cs <- cumsum(wd == 0)
if(cs[1] > 0) cs <- cs - 1
nr <- max(cs) + 1
par(oma=c(0.1,0.1,4.6,0.1))
par(mfrow=c(nr,7))
par(mar=c(0,0,0,0))
for(i in seq_len(wd[1])){
plot.new()
#box()
}
day.name <- c('Sun','Mon','Tues','Wed','Thur','Fri','Sat')
for(i in tmp$day){
plot.new()
box()
text(0,1, i, adj=c(0,1))
if(i < 8) mtext( day.name[wd[i]+1], line=0.5,
at=grconvertX(0.5,to='ndc'), outer=TRUE )
}
mtext(month.name[month], line=2.5, at=0.5, cex=1.75, outer=TRUE)
#box('inner') #optional
}
}
week_days <- function(x){
days <- c(1:7)
names(days) <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
days_index <- which(names(days)==x)
return(unname(days[days_index]))
}
```
```{r , message=FALSE, echo=FALSE, cache=TRUE, error=FALSE, results='asis'}
for (i in month(latest_date):2){
df_m <- monthly[which(month(monthly$dates)==i),]
first_day <- weekdays(as.Date(paste0("2020-", i, "-01"), "%Y-%m-%d"))
C_matrix <- matrix(NA, ncol=3, nrow=monthDays(as.Date(paste0("2020-", i, "-01"))))
total_days <- week_days(first_day):(week_days(first_day)+monthDays(as.Date(paste0("2020-", i, "-01")))-1)
C_matrix[,1] <- ceiling(total_days/7)
C_matrix[,2] <- total_days%%7
C_matrix[,2] <- ifelse(C_matrix[,2]==0, 7, C_matrix[,2])
for (j in 1:nrow(df_m)){
C_matrix[mday(df_m$dates[j]),3] <- df_m$cases[j]
}
cat('### ', month.abb[i],' \n')
cal(i, 2020)
for (k in mday(df_m$dates)){
par(mfg=C_matrix[k,1:2])
text(.5, .5, as.character(C_matrix[k,3]), cex=2)
}
cat('\n \n')
}
```
Column {data-width=500}
-----------------------------------------------------------------------
### Distribution of Daily Cases
```{r}
D <- data.frame(Dates=names(daily_cases), cases=unname(daily_cases))
p_dates <- plot_ly(D, x=~Dates, y=~cases, type="bar", text=as.character(cumsum(daily_cases)), name="",
hovertemplate = paste('%{x}', '
Daily Cases: %{y:s}
',
'Total Cases: %{text:s}'))
p_dates <- p_dates %>% layout(uniformtext=list(minsize=8,mode='hide')) %>% config(displayModeBar = F)
p_dates
```
Distribution by Counties
=======================================================================
```{r}
county_cases <- table(df$County, df$Case_Count)
county_cases_all <- apply(county_cases, 1, function(x) sum(x*as.numeric(colnames(county_cases))))
df_ohio_cases <- data.frame(county=names(county_cases_all), count=county_cases_all)
rownames(df_ohio_cases) <- c()
usa <- map_data("county") # get basic map data for all USA counties
oh <- subset(usa, region == "ohio") # subset to counties in Ohio
oh$county = str_to_title(oh$subregion)
my.df = merge(oh, df_ohio_cases, by = "county", all.x = TRUE,
sort = FALSE)
#my.df$count <- ifelse(is.na(my.df$count), 0, my.df$count)
my.df = my.df[order(my.df$order), ]
getLabelPoint <- # Returns a county-named list of label points
function(county) {Polygon(county[c('long', 'lat')])@labpt}
centroids = by(oh, oh$county, getLabelPoint) # Returns list
centroids2 <- do.call("rbind.data.frame", centroids) # Convert to Data Frame
centroids2$county = rownames(centroids)
names(centroids2) <- c('clong', 'clat', "county") # Appropriate Header
centroids3 <- merge(centroids2, df_ohio_cases, by="county", all.x=TRUE, sort=FALSE)
centroids3$count <- ifelse(is.na(centroids3$count), 0, centroids3$count)
centroids3$label <- paste0(centroids3$county,": ", centroids3$count, " Cases")
g <- ggplot(centroids3, aes(x = clong, y = clat, group = 1,
text = paste0(county,
":
", count, " cases"),
))
g <- g + geom_polygon(data=my.df,
aes(x=long, y=lat, group=group, fill = count),
color="black", size = 0.2) +
geom_text(data = centroids3, aes(x = clong, y = clat, label = county), color = "black", size = 4)+
scale_fill_continuous(name="Confirmed Cases", low = "lightblue",
high = "darkblue",limits = c(0,max(my.df$count)), na.value = "grey50") +
labs(title="Confirmed Cases in Ohio") + theme(legend.position = "none", axis.title.x=element_blank(), axis.text.x=element_blank(),
axis.ticks.x=element_blank(), axis.title.y=element_blank(),
axis.text.y =element_blank(), axis.ticks.y=element_blank())
ggplotly(g, tooltip = "text") %>% layout(autosize = F, width = 1200, height = 800)
```
Distribution by Age
=======================================================================
Column {data-width=500}
---
### Distribution of Confirmed Cases by the Age Range
**We excluded `r length(which(df$Age_Range=="Unknown"))` people whose age is unknown.**
\
```{r}
# remove the cases for which the age range is "Unknown"
if (length(which(df$Age_Range=="Unknown"))==0){
df1 <- df
}else{
df1 <- df[-which(df$Age_Range=="Unknown"),]
}
df1$Age_Range <- factor(df1$Age_Range)
# find counts and relative counts (%) in each age range
Age_Dist <- table(df1$Age_Range, df1$Case_Count)
n <- sum(apply(Age_Dist, 1, function(x) sum(x*as.numeric(colnames(Age_Dist)))))
Age_Percent <- round(apply(Age_Dist, 1, function(x) sum(x*as.numeric(colnames(Age_Dist))))/n*100,2)
# form a data frame for the summary information of AGE
df_age <- data.frame(Age_Range=levels(df1$Age_Range), Percent_Cases=Age_Percent, text1=paste0(Age_Percent, "%"))
# obtatin the bar chart for the distribution of Ohio's confirmed cases by the Age Range
p_age <- plot_ly(df_age, x=~Age_Range, y=~Percent_Cases, type="bar",
text = df_age$text1, textposition = 'outside')%>% config(displayModeBar = F)
p_age <- p_age %>% layout(xaxis=list(title="Age Range"), yaxis=list(title="Percent of Cases"))
p_age %>% layout(autosize = F, width = 650, height = 650)
```
Column {data-width=500}
---
### Distribution of Death Cases by Age Range
**We excluded `r length(which(df$Age_Range=="Unknown"))` people whose age is unknown.**
\
```{r}
# find death counts and relative counts (%) in each age range
Age_Dist_Death <- table(df1$Age_Range, df1$Death_Count)
n <- sum(apply(Age_Dist_Death, 1, function(x) sum(x*as.numeric(colnames(Age_Dist_Death)))))
Age_Percent_Death <- round(apply(Age_Dist_Death, 1, function(x) sum(x*as.numeric(colnames(Age_Dist_Death))))/n*100,2)
# form a data frame for the summary information of AGE
df_age_death <- data.frame(Age_Range=levels(df1$Age_Range), Percent_Cases=Age_Percent_Death, text1=paste0(Age_Percent_Death, "%"))
# obtatin the bar chart for the distribution of Ohio's confirmed cases by the Age Range
p_age_death <- plot_ly(df_age_death, x=~Age_Range, y=~Percent_Cases, type="bar",
text = df_age_death$text1, textposition = 'outside')%>% config(displayModeBar = F)
p_age_death <- p_age_death %>% layout(xaxis=list(title="Age Range"), yaxis=list(title="Percent of Death Cases"))
p_age_death %>% layout(autosize = F, width = 650, height = 650)
```